Official code release of our work, Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages.
Setup • Train • Evaluation • License • Citation
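As a toy illustration of the idea (not the repository's actual pipeline), summarize-and-generate back-translation turns monolingual code into pseudo-parallel training pairs by summarizing source code into natural language and generating target-language code from the summary. The lookup tables and function names below are hypothetical stand-ins for the trained summarization and generation models:

```python
# Toy sketch of summarize-and-generate back-translation.
# The dictionaries below are hypothetical stand-ins for the trained
# summarization (code -> text) and generation (text -> code) models.

JAVA_TO_SUMMARY = {
    "int add(int a, int b) { return a + b; }": "add two integers",
}
SUMMARY_TO_PYTHON = {
    "add two integers": "def add(a, b): return a + b",
}

def summarize(java_code):
    # code -> natural-language summary (stand-in for the sum model)
    return JAVA_TO_SUMMARY[java_code]

def generate(summary):
    # summary -> code in the target language (stand-in for the gen model)
    return SUMMARY_TO_PYTHON[summary]

def back_translate(java_corpus):
    # build pseudo-parallel (python, java) pairs from monolingual Java code
    return [(generate(summarize(code)), code) for code in java_corpus]

pairs = back_translate(list(JAVA_TO_SUMMARY))
print(pairs[0][0])  # def add(a, b): return a + b
```

The resulting (python, java) pairs can then supervise a translation model even though no parallel data existed to begin with, which is the premise of the paper.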
## Setup

We recommend setting up a conda environment to run experiments and assume Anaconda is installed. Install the additional requirements (listed in requirements.txt) by running:
```bash
bash install_env.sh
```
Then build the tree-sitter parsers for Java and Python by running:

```bash
python build.py
```
Finally, download the pre-trained PLBART checkpoints:

```bash
cd plbart
bash download.sh
```

## Train

Two model sizes are available, so experiments can be run with MODEL_SIZE=base|large.
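As a minimal sketch of how a MODEL_SIZE argument might be validated and resolved to a checkpoint (the helper name and file paths below are illustrative assumptions, not the repository's actual layout):

```python
# Hypothetical helper: validate MODEL_SIZE and resolve a checkpoint path.
# The paths below are illustrative, not the repository's actual layout.
CHECKPOINTS = {
    "base": "plbart/plbart_base.pt",
    "large": "plbart/plbart_large.pt",
}

def checkpoint_for(model_size):
    # reject anything other than the two supported sizes
    if model_size not in CHECKPOINTS:
        raise ValueError(
            "MODEL_SIZE must be one of: " + ", ".join(sorted(CHECKPOINTS))
        )
    return CHECKPOINTS[model_size]

print(checkpoint_for("base"))  # plbart/plbart_base.pt
```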
```bash
cd sumgen
bash run.sh GPU_ID [MODEL_SIZE]
```

```bash
cd plbart
bash train.sh GPU_ID [MODEL_SIZE]
```

## Evaluation

```bash
cd sumgen/evaluation
bash decode.sh GPU_ID SOURCE TARGET MODEL_SIZE BEAM_SIZE
bash evaluate.sh SAVE_DIR SOURCE TARGET
```

For example, run the following commands to get results with the default settings:
```bash
cd sumgen/evaluation

# to evaluate the base model
bash decode.sh 0 java python base 10
bash evaluate.sh base_java_python_b10 java python

# to evaluate the large model
bash decode.sh 0 java python large 10
bash evaluate.sh large_java_python_b10 java python
```

```bash
cd scripts
bash run.sh GPU_ID
```

## License

The contents of this repository are under the MIT license. The license also applies to the pre-trained and fine-tuned models.
## Citation

If you use any of the datasets, models, or code modules, please cite the following paper:
```bibtex
@article{ahmad2022sumgen,
  author     = {Wasi Uddin Ahmad and Saikat Chakraborty and Baishakhi Ray and Kai-Wei Chang},
  title      = {Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages},
  journal    = {CoRR},
  volume     = {abs/2205.11116},
  year       = {2022},
  url        = {https://arxiv.org/abs/2205.11116},
  eprinttype = {arXiv},
  eprint     = {2205.11116}
}
```